Let the syntactic variables be 1, 2, 3, ..., v and words will be
denoted x, y, z, etc. Introduce the matrix


(1)  $P(x) = \{ p_{ij}(x) \}$

where $p_{ij}(x)$ is the probability of rewriting $i \to jx$, and the
vector

(2)  $r(x) = \{ r_i(x) \}$

where $r_i(x)$ is the probability of rewriting $i \to x$ (see Grenander's
paper in the Neyman Festschrift for further details and equation (3)
used below).

When we search for the syntactic variables it may be best to
organize the search from below, following a suggestion by Henry
Kucera. This means that we first try to group words into classes
$c_1, c_2, \ldots, c_\rho$, then group classes into higher level classes and
so on. It seems as if this would reduce the search effort
drastically since the number of words $n_w$ is much larger than the
number of syntactic classes n.

When we do this we have to proceed by testing for linguistic
equivalence, similarly to the method in the paper on the abduction machine.

Two words x and y are said to be equivalent, written as $x \equiv y$,
if uxv and uyv are either both grammatical or both non-grammatical
for arbitrary lexical strings u and v.

The  search  will  depend  crucially  on  how  difficult  it  is  to 
separate  x from  y by  equivalence  when  they  are  not  equivalent. 

The trouble is that when we test with u and v, a negative answer
is enough to establish $x \not\equiv y$, but a positive answer is not
enough to establish $x \equiv y$. In principle we would have to go
through all u and v.

Lemma 1. The statement $x \equiv y$ is the same as to say that P(uxv) and
P(uyv) are both zero or both not zero, for all strings u and v.

Proof: For a given lexical string $S = x_1, x_2, \ldots, x_n$ we get the
probability

(3)  $P(S) = d\, P(x_1) P(x_2) \cdots P(x_{n-1})\, r(x_n)$

where d is the vector $(1, 0, 0, \ldots, 0)$. We know from our earlier work,
however, that S being grammatical is the same as P(S) being
positive, hence our statement is correct, and we shall see that (3)
can be used to clear up the situation more.
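Formula (3) is a row vector times a chain of matrices times a final column vector, so it can be evaluated from left to right. The following is a minimal sketch only: the function name `string_probability`, the two-variable toy grammar and all of its numbers are assumptions made for illustration and do not come from the text.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def string_probability(words: Sequence[str],
                       P: Dict[str, Matrix],
                       r: Dict[str, Vector]) -> float:
    """Evaluate P(S) = d P(x_1) ... P(x_{n-1}) r(x_n) with d = (1, 0, ..., 0)."""
    if not words:
        raise ValueError("a lexical string needs at least one word")
    k = len(r[words[0]])
    row = [1.0] + [0.0] * (k - 1)            # the start vector d
    for w in words[:-1]:                     # multiply by P(x_1), ..., P(x_{n-1})
        M = P[w]
        row = [sum(row[i] * M[i][j] for i in range(k)) for j in range(k)]
    last = r[words[-1]]                      # close with r(x_n)
    return sum(row[i] * last[i] for i in range(k))


# Hypothetical toy grammar with two syntactic variables (indices 0 and 1).
# For each variable i, the entries p_ij(x) and r_i(x) sum to 1 over x and j.
P = {"a": [[0.0, 0.5], [0.0, 0.0]],
     "b": [[0.0, 0.0], [0.0, 0.5]]}
r = {"a": [0.0, 0.5],
     "b": [0.5, 0.0]}

p_aba = string_probability(["a", "b", "a"], P, r)   # positive, so grammatical
p_ab = string_probability(["a", "b"], P, r)         # zero, so not grammatical
```

In this toy grammar the string "a b a" gets a positive probability while "a b" gets probability zero, which is the grammatical versus non-grammatical distinction used in the proof.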

Introduce the function of two words x and y

(4)  $d(x,y) = \max_i \{ \sum_j | p_{ij}(x) - p_{ij}(y) | \} + \max_i \{ | r_i(x) - r_i(y) | \}$

Lemma 2. The function d is a pseudo-distance:

(a)  $d \ge 0$, $d(x,x) = 0$

(b)  $d(x,y) = d(y,x)$

(c)  $d(x,z) \le d(x,y) + d(y,z)$.

Proof: (a) and (b) are obvious. We have

(5)  $d(x,z) = \max_i \{ \sum_j | p_{ij}(x) - p_{ij}(z) | \} + \max_i \{ | r_i(x) - r_i(z) | \}$
     $\le \max_i \{ \sum_j | p_{ij}(x) - p_{ij}(y) | \} + \max_i \{ \sum_j | p_{ij}(y) - p_{ij}(z) | \}$
     $+ \max_i | r_i(x) - r_i(y) | + \max_i | r_i(y) - r_i(z) | = d(x,y) + d(y,z)$.


Note however that d(x,y) = 0 does not imply x = y; it only
means that P(x) = P(y) and r(x) = r(y). But using Lemma 1
this means that $x \equiv y$, so that the pseudo-distance separates the
words in the dictionary into equivalence classes. Also $x \equiv y$
does not imply d(x,y) = 0. It is also clear that

$d \le 2$ since $\sum_x [ \sum_j p_{ij}(x) + r_i(x) ] = 1$ for each i.
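The pseudo-distance can be computed directly from the rewriting probabilities: a largest absolute row sum of P(x) - P(y) plus a largest absolute entry of r(x) - r(y). The sketch below is an illustration under assumptions: the name `pseudo_distance` and the three-word toy dictionary are hypothetical, and the words "a" and "c" are deliberately given identical probabilities so that two distinct words end up at distance zero.

```python
from itertools import product
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def pseudo_distance(x: str, y: str,
                    P: Dict[str, Matrix],
                    r: Dict[str, Vector]) -> float:
    """Largest absolute row sum of P(x)-P(y) plus largest entry of |r(x)-r(y)|."""
    k = len(r[x])
    matrix_term = max(sum(abs(P[x][i][j] - P[y][i][j]) for j in range(k))
                      for i in range(k))
    vector_term = max(abs(r[x][i] - r[y][i]) for i in range(k))
    return matrix_term + vector_term


# Hypothetical toy dictionary: "a" and "c" share the same P and r.
P = {"a": [[0.0, 0.25], [0.0, 0.0]],
     "b": [[0.0, 0.0], [0.0, 0.5]],
     "c": [[0.0, 0.25], [0.0, 0.0]]}
r = {"a": [0.0, 0.25],
     "b": [0.5, 0.0],
     "c": [0.0, 0.25]}

words = sorted(P)
dist = {(x, y): pseudo_distance(x, y, P, r)
        for x, y in product(words, repeat=2)}

# Property (c) of the lemma, checked on every triple of the toy dictionary.
triangle_ok = all(dist[x, z] <= dist[x, y] + dist[y, z]
                  for x, y, z in product(words, repeat=3))
```

Here `dist["a", "c"]` is zero although "a" and "c" are different words, while `dist["a", "b"]` is 1.0, below the general ceiling of 2.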


We can now get a bound on how difficult it is to separate
x from y by the testing procedure mentioned above. We have,
using (3) for $S = x_1 x_2 \ldots x_{r-1}\, x\, x_{r+1} \ldots x_n$ and
$S' = x_1 x_2 \ldots x_{r-1}\, y\, x_{r+1} \ldots x_n$,

(6)  $P(S) - P(S') = d\,[ P(x_1) \cdots P(x_{r-1}) P(x) P(x_{r+1}) \cdots r(x_n)$
     $- P(x_1) \cdots P(x_{r-1}) P(y) P(x_{r+1}) \cdots r(x_n) ]$
     $= d A [P(x) - P(y)] B\, r(x_n)$,

where $A = P(x_1) \cdots P(x_{r-1})$ and $B = P(x_{r+1}) \cdots P(x_{n-1})$.


The matrix P(x) has norm bounded by 1 since

(7)  $\| P(x) \| \le \max_i \sum_j p_{ij}(x) \le 1$.

Hence $\|A\|$ and $\|B\| \le 1$ so that

$| P(S) - P(S') | \le \| P(x) - P(y) \| \le d(x,y)$.


This was when v is not empty. If v is empty, so that the
sentences end with x and y respectively, we get instead with a
similar argument

(8)  $| P(S) - P(S') | \le \| r(x) - r(y) \| \le d(x,y)$.


Hence we have

Theorem 1. The difference in test probabilities for two words
x and y is bounded by

(9)  $| P(S) - P(S') | \le d(x,y)$.

It is not known at present how sharp this inequality is.
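For a small grammar the bound can be probed numerically by running through all short contexts u and v and recording the largest difference between the probabilities of uxv and uyv. The sketch below does this under stated assumptions: the helper names, the cut-off `max_len` on the length of u and v, and the three-word toy grammar are all hypothetical, and a check on one toy grammar says nothing general about sharpness.

```python
from itertools import product
from typing import Dict, List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def string_probability(words: Sequence[str], P: Dict[str, Matrix],
                       r: Dict[str, Vector]) -> float:
    """P(S) = d P(x_1) ... P(x_{n-1}) r(x_n) with d = (1, 0, ..., 0)."""
    k = len(r[words[0]])
    row = [1.0] + [0.0] * (k - 1)
    for w in words[:-1]:
        row = [sum(row[i] * P[w][i][j] for i in range(k)) for j in range(k)]
    return sum(row[i] * r[words[-1]][i] for i in range(k))


def pseudo_distance(x: str, y: str, P: Dict[str, Matrix],
                    r: Dict[str, Vector]) -> float:
    """Largest absolute row sum of P(x)-P(y) plus largest entry of |r(x)-r(y)|."""
    k = len(r[x])
    return (max(sum(abs(P[x][i][j] - P[y][i][j]) for j in range(k))
                for i in range(k))
            + max(abs(r[x][i] - r[y][i]) for i in range(k)))


def largest_gap(x: str, y: str, P: Dict[str, Matrix],
                r: Dict[str, Vector], max_len: int = 2) -> float:
    """Largest |P(uxv) - P(uyv)| over all contexts u, v of length <= max_len."""
    words = sorted(P)
    gap = 0.0
    for len_u, len_v in product(range(max_len + 1), repeat=2):
        for u in product(words, repeat=len_u):
            for v in product(words, repeat=len_v):
                p_x = string_probability(u + (x,) + v, P, r)
                p_y = string_probability(u + (y,) + v, P, r)
                gap = max(gap, abs(p_x - p_y))
    return gap


# Hypothetical toy grammar; "a" and "c" have identical P and r.
P = {"a": [[0.0, 0.25], [0.0, 0.0]],
     "b": [[0.0, 0.0], [0.0, 0.5]],
     "c": [[0.0, 0.25], [0.0, 0.0]]}
r = {"a": [0.0, 0.25], "b": [0.5, 0.0], "c": [0.0, 0.25]}

gap_ab, bound_ab = largest_gap("a", "b", P, r), pseudo_distance("a", "b", P, r)
gap_ac, bound_ac = largest_gap("a", "c", P, r), pseudo_distance("a", "c", P, r)
```

Comparing `gap_ab` with `bound_ab` on examples like this gives a rough feeling for how much slack the inequality leaves, at least for the contexts that were enumerated.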

We now start the abduction from below and consider the
dictionary $D = \{1, 2, \ldots, n_w\}$, to begin with considered as a single
class called $c_1$.

Partition Algorithm: After t sentences have been heard D has
been partitioned into classes $c_1, c_2, \ldots, c_\rho$, mutually disjoint
and exhaustive. Then sentence No. t+1 is heard, S = uxv; one
word x appearing in it is picked (systematically or at random?)
and replaced by another word y in the same class c(x). The
following action is taken.

(a)  if S' = uyv is grammatical nothing is done and the
algorithm loops to the next sentence.

(b)  if S' = uyv is not grammatical x is removed from
class c(x) and we move to (c).

(c)  start a new loop going through all the other classes
$c_1, c_2, \ldots \ne c(x)$. In each pick a word z (systematically or at
random) and test for $x \equiv z$ as before. The first time the
answer is positive move x to this class. Otherwise move to (d).


(d)  create a new class $c_{\rho+1}$, consisting of just x. Then
move back to (a).
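Steps (a) to (d) can be sketched for a single heard sentence as follows. This is one possible reading, not a definitive implementation: the function names, the representation of classes as Python sets, the `is_grammatical` oracle, the rule that nothing is tested when x is alone in its class, and the toy noun/verb language are all assumptions. The "systematic" choice of y and z is taken here to be the alphabetically first candidate (`pick=min`); a random choice can be passed in instead.

```python
from typing import Callable, List, Sequence, Set


def substitute(sentence: Sequence[str], pos: int, word: str) -> List[str]:
    """Return the sentence with the word at position pos replaced by word."""
    return list(sentence[:pos]) + [word] + list(sentence[pos + 1:])


def partition_step(sentence: Sequence[str], pos: int,
                   classes: List[Set[str]],
                   is_grammatical: Callable[[Sequence[str]], bool],
                   pick: Callable[[List[str]], str] = min) -> int:
    """Apply steps (a)-(d) to x = sentence[pos]; return the number of tests made."""
    x = sentence[pos]
    home = next(c for c in classes if x in c)          # the class c(x)
    candidates = sorted(home - {x})
    if not candidates:
        return 0                                       # no y available to test
    tests = 1
    if is_grammatical(substitute(sentence, pos, pick(candidates))):
        return tests                                   # (a) nothing is done
    home.discard(x)                                    # (b) remove x from c(x)
    for c in classes:                                  # (c) try the other classes
        if c is home:
            continue
        tests += 1
        if is_grammatical(substitute(sentence, pos, pick(sorted(c)))):
            c.add(x)
            return tests
    classes.append({x})                                # (d) open a new class
    return tests


# Hypothetical toy language: a sentence is grammatical iff it is "noun verb".
NOUNS, VERBS = {"cat", "dog"}, {"runs", "sleeps"}


def toy_oracle(s: Sequence[str]) -> bool:
    return len(s) == 2 and s[0] in NOUNS and s[1] in VERBS


classes: List[Set[str]] = [NOUNS | VERBS]              # start from a single class
heard = [(["dog", "runs"], 1), (["cat", "sleeps"], 1), (["dog", "sleeps"], 0)]
total_tests = sum(partition_step(s, pos, classes, toy_oracle) for s, pos in heard)
```

On these three sentences the single starting class splits into the nouns and the verbs. Under this reading a word that sits alone in its class is never tested again, so other sentence orders can leave a partition finer than the intended one.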


Define a probability measure over the dictionary. Say
that the true partition is

(10)  [equation illegible in the source]

and say that if $x \in c_i$, $y \in c_j$ then the test will give a negative
answer with a probability $d_{ij}$, depending only upon the class
indices. Let us choose

(11)  $d_{ii} = 0$ (this is not necessary, perhaps), $0 \le d_{ij} \le 1$

and let us also make $d_{ij}$ into a distance. A simple choice would be

$d_{ij} =$ [right-hand side illegible in the source]

Simulate the procedure and count the number of tests until the true
partition has been reached. The mean number of tests is an
appropriate measure of the computational work required by the algorithm.
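A simulation of this kind could look as follows. It is a sketch under several assumptions that are not in the text: words are drawn uniformly from the dictionary, y and z are picked at random, a test between words of true classes i and j is negative with probability `d[i][j]`, and the labels and probabilities in the example are invented. The cap `max_sentences` is also an assumption; it is there because, under this reading of the procedure, a word alone in its class is never tested again, so a run can stall in a partition finer than the true one.

```python
import random
from typing import Dict, List, Set, Tuple


def simulate(labels: Dict[str, int], d: List[List[float]],
             rng: random.Random,
             max_sentences: int = 2000) -> Tuple[int, bool, List[Set[str]]]:
    """Run the partition procedure with noisy tests.

    Returns (number of tests, whether the true partition was reached, classes).
    """
    words = sorted(labels)
    truth = {frozenset(w for w in words if labels[w] == k)
             for k in set(labels.values())}
    classes: List[Set[str]] = [set(words)]        # start from a single class
    tests = 0

    def positive(x: str, y: str) -> bool:
        """One noisy test: negative with probability d[class(x)][class(y)]."""
        nonlocal tests
        tests += 1
        return rng.random() >= d[labels[x]][labels[y]]

    def reached() -> bool:
        return {frozenset(c) for c in classes} == truth

    for _ in range(max_sentences):
        if reached():
            break
        x = rng.choice(words)                     # assumed uniform measure
        home = next(c for c in classes if x in c)
        rest = sorted(home - {x})
        if not rest:
            continue                              # x is alone: nothing to test
        if positive(x, rng.choice(rest)):
            continue                              # (a) nothing is done
        home.discard(x)                           # (b) remove x from c(x)
        for c in classes:                         # (c) try the other classes
            if c is not home and positive(x, rng.choice(sorted(c))):
                c.add(x)
                break
        else:
            classes.append({x})                   # (d) open a new class
    return tests, reached(), classes


# Hypothetical example: five words in three true classes.
labels = {"w1": 0, "w2": 0, "w3": 1, "w4": 1, "w5": 2}
d = [[0.0, 0.9, 0.9], [0.9, 0.0, 0.9], [0.9, 0.9, 0.0]]
rng = random.Random(0)
runs = [simulate(labels, d, rng) for _ in range(50)]
finished = [t for t, ok, _ in runs if ok]
mean_tests = sum(finished) / len(finished) if finished else float("nan")
```

`mean_tests` averages only over the runs that reached the true partition; the share of stalled runs is worth reporting next to it, since it depends on the choice of `d` and on how x, y and z are picked.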

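Finally, a remark about a distance measure between two partitions
described by the incidence matrices M and M'. Pick two elements x and
y at random and look for the probability that $(x \equiv y) \ne (x \equiv' y)$,
that is, that the two partitions disagree on whether x and y belong
to the same class.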
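One way to read this distance between partitions is as the fraction of pairs on which the two incidence matrices differ. The sketch below assumes that x and y are drawn independently and uniformly (so pairs with x = y are included and always agree) and that an incidence matrix has a 1 in position (x, y) exactly when x and y share a class; the function names and the example partitions are hypothetical.

```python
from typing import List, Sequence, Set


def incidence(partition: Sequence[Set[int]],
              elements: Sequence[int]) -> List[List[int]]:
    """Incidence matrix: entry (a, b) is 1 iff a and b lie in the same class."""
    index = {e: k for k, cls in enumerate(partition) for e in cls}
    return [[1 if index[a] == index[b] else 0 for b in elements]
            for a in elements]


def partition_distance(M1: List[List[int]], M2: List[List[int]]) -> float:
    """Probability that a uniformly drawn ordered pair is classified differently."""
    n = len(M1)
    disagreements = sum(M1[i][j] != M2[i][j]
                        for i in range(n) for j in range(n))
    return disagreements / (n * n)


elements = [0, 1, 2, 3]
A = incidence([{0, 1}, {2, 3}], elements)
B = incidence([{0, 1, 2}, {3}], elements)
C = incidence([{0}, {1}, {2}, {3}], elements)

d_AB = partition_distance(A, B)   # 6 of the 16 ordered pairs disagree
d_AC = partition_distance(A, C)   # 4 of the 16 ordered pairs disagree
d_BC = partition_distance(B, C)   # 6 of the 16 ordered pairs disagree
```

In this example the three values satisfy the triangle inequality, and the quantity is zero exactly when the two incidence matrices coincide.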


Lemma 3. This probability is a distance
